Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 2061357 |
| Missing cells | 2654282 |
| Missing cells (%) | 6.4% |
| Duplicate rows | 6725 |
| Duplicate rows (%) | 0.3% |
| Total size in memory | 314.5 MiB |
| Average record size in memory | 160.0 B |
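The percentages and the average record size in this table can be recomputed from the raw counts. The sketch below does so under two assumptions that the report does not state: the missing-cell percentage is taken over all cells (variables × observations), and 1 MiB is 1024 × 1024 bytes.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 20
n_observations = 2_061_357
missing_cells = 2_654_282
duplicate_rows = 6_725
total_size_mib = 314.5

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the recomputed values round to the figures shown in the table.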
Variable types
| Text | 5 |
|---|---|
| Numeric | 6 |
| Categorical | 5 |
| DateTime | 4 |
Alerts
| Dataset has 6725 (0.3%) duplicate rows | Duplicates |
|---|---|
| arrival_delay_check is highly overall correlated with departure_delay_check | High correlation |
| arrival_delay_m is highly overall correlated with departure_delay_m | High correlation |
| departure_delay_check is highly overall correlated with arrival_delay_check | High correlation |
| departure_delay_m is highly overall correlated with arrival_delay_m | High correlation |
| eva_nr is highly overall correlated with long and 2 other fields | High correlation |
| info is highly overall correlated with lat and 3 other fields | High correlation |
| lat is highly overall correlated with info and 2 other fields | High correlation |
| long is highly overall correlated with eva_nr and 2 other fields | High correlation |
| state is highly overall correlated with eva_nr and 4 other fields | High correlation |
| zip is highly overall correlated with eva_nr and 3 other fields | High correlation |
| arrival_delay_check is highly imbalanced (69.8%) | Imbalance |
| departure_delay_check is highly imbalanced (69.7%) | Imbalance |
| path has 211355 (10.3%) missing values | Missing |
| arrival_plan has 211355 (10.3%) missing values | Missing |
| arrival_change has 475630 (23.1%) missing values | Missing |
| departure_change has 339926 (16.5%) missing values | Missing |
| info has 1416016 (68.7%) missing values | Missing |
| arrival_delay_m has 1406905 (68.3%) zeros | Zeros |
| departure_delay_m has 1338078 (64.9%) zeros | Zeros |
Reproduction
| Analysis started | 2024-11-16 19:02:59.691487 |
|---|---|
| Analysis finished | 2024-11-16 19:04:32.948068 |
| Duration | 1 minute and 33.26 seconds |
| Software version | ydata-profiling v4.11.0 |
Variables
ID
Text
| Distinct | 2029894 |
|---|---|
| Distinct (%) | 98.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 34 |
|---|---|
| Median length | 33 |
| Mean length | 32.847716 |
| Min length | 28 |
Unique
| Unique | 1998431 |
|---|---|
| Unique (%) | 96.9% |
Sample
| 1st row | 1573967790757085557-2407072312-14 |
|---|---|
| 2nd row | 349781417030375472-2407080017-1 |
| 3rd row | 7157250219775883918-2407072120-25 |
| 4th row | 349781417030375472-2407080017-2 |
| 5th row | 1983158592123451570-2407080010-3 |
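The sample IDs appear to consist of three dash-separated parts: a long numeric identifier, a ten-digit block that reads like a timestamp (`2407072312` would be 2024-07-07 23:12, which fits the date range of the date columns), and a small trailing integer. The report does not document this layout, so the sketch below is an assumption, and the field names `trip_id`, `planned_start` and `stop_index` are hypothetical. It splits from the right because the character counts for this column list more than two dashes per row on average, which suggests that some identifiers carry an extra dash such as a leading minus sign.

```python
from datetime import datetime
from typing import NamedTuple


class ParsedId(NamedTuple):
    # Field names are hypothetical; the report does not document the ID layout.
    trip_id: str
    planned_start: datetime
    stop_index: int


def parse_id(raw: str) -> ParsedId:
    """Split an ID like '<number>-<YYMMDDHHMM>-<n>' into its three parts."""
    # rsplit from the right so a leading '-' in the first part stays intact.
    trip_id, stamp, index = raw.rsplit("-", 2)
    return ParsedId(trip_id, datetime.strptime(stamp, "%y%m%d%H%M"), int(index))


parsed = parse_id("1573967790757085557-2407072312-14")
print(parsed.trip_id, parsed.planned_start.isoformat(), parsed.stop_index)
```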
| Value | Count | Frequency (%) |
| 6858979647319381812-2407090618-14 | 2 | < 0.1% |
| 5820508505174735710-2407121206-10 | 2 | < 0.1% |
| 5798039258756669712-2407040428-9 | 2 | < 0.1% |
| 8529221368551661117-2407100726-11 | 2 | < 0.1% |
| 2977632636429677844-2407040423-14 | 2 | < 0.1% |
| 3460592075112938158-2407141643-10 | 2 | < 0.1% |
| 3544810355449558240-2407040433-3 | 2 | < 0.1% |
| 4747130191045816853-2407081511-20 | 2 | < 0.1% |
| 3476154881632071886-2407081524-14 | 2 | < 0.1% |
| 6289938859164500858-2407040440-4 | 2 | < 0.1% |
| Other values (2029884) | 2061337 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 8495442 | |
| 0 | 8383080 | |
| 2 | 7800385 | |
| 4 | 7210622 | |
| 7 | 6524425 | |
| 3 | 5332792 | |
| - | 5158132 | |
| 8 | 4909314 | |
| 5 | 4840335 | |
| 6 | 4530621 |
line
Text
| Distinct | 296 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 1 |
| Mean length | 1.607087 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 20 |
|---|---|
| 2nd row | 18 |
| 3rd row | 1 |
| 4th row | 18 |
| 5th row | 33 |
| Value | Count | Frequency (%) |
| 1 | 327551 | 15.9% |
| 2 | 170917 | 8.3% |
| 3 | 167354 | 8.1% |
| 6 | 117441 | 5.7% |
| 5 | 102976 | 5.0% |
| 8 | 98876 | 4.8% |
| 7 | 75570 | 3.7% |
| 4 | 71711 | 3.5% |
| 9 | 65965 | 3.2% |
| 42 | 42461 | 2.1% |
| Other values (284) | 820552 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 643478 | |
| 2 | 401434 | |
| 3 | 316557 | |
| 4 | 283984 | |
| 5 | 272514 | |
| 6 | 263947 | |
| 8 | 206967 | 6.2% |
| 7 | 195939 | 5.9% |
| R | 194774 | 5.9% |
| 9 | 139356 | 4.2% |
| Other values (23) | 393830 |
path
Text
Missing 
| Distinct | 22153 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 211355 |
| Missing (%) | 10.3% |
| Memory size | 15.7 MiB |
Length
| Max length | 1229 |
|---|---|
| Median length | 626 |
| Mean length | 181.48341 |
| Min length | 4 |
Unique
| Unique | 1118 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Stolberg(Rheinl)Hbf Gl.44|Eschweiler-St.Jöris|Alsdorf Poststraße|Alsdorf-Mariadorf|Alsdorf-Kellersberg|Alsdorf-Annapark|Alsdorf-Busch|Herzogenrath-August-Schmidt-Platz|Herzogenrath-Alt-Merkstein|Herzogenrath|Kohlscheid|Aachen West|Aachen Schanz |
|---|---|
| 2nd row | Hamm(Westf)Hbf|Kamen|Kamen-Methler|Dortmund-Kurl|Dortmund-Scharnhorst|Dortmund Hbf|Bochum Hbf|Wattenscheid|Essen Hbf|Mülheim(Ruhr)Hbf|Duisburg Hbf|Düsseldorf Flughafen|Düsseldorf Hbf|Düsseldorf-Benrath|Leverkusen Mitte|Köln-Mülheim|Köln Messe/Deutz|Köln Hbf|Köln-Ehrenfeld|Horrem|Düren|Langerwehe|Eschweiler Hbf|Stolberg(Rheinl)Hbf |
| 3rd row | Aachen Hbf |
| 4th row | Herzogenrath|Kohlscheid |
| 5th row | Herzogenrath |
| Value | Count | Frequency (%) |
| hbf | 456947 | 3.4% |
| s)|berlin | 450434 | 3.4% |
| allee|berlin | 163575 | 1.2% |
| berlin | 111434 | 0.8% |
| friedrichstraße | 96045 | 0.7% |
| straße|berlin | 90394 | 0.7% |
| am | 89539 | 0.7% |
| flughafen | 86429 | 0.7% |
| ostkreuz | 85308 | 0.6% |
| rosenheimer | 81857 | 0.6% |
| Other values (13360) | 11560324 |
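The sample rows show that `path` stores a list of stops joined by `|`. The frequency table above, by contrast, contains tokens such as `s)|berlin`, which is what one would get if values were lower-cased and split on whitespace only. That tokenization is an assumption about how the table was produced, not something the report states. The sketch below contrasts the two ways of splitting, using a shortened version of the first sample row.

```python
# Shortened from the first sample row of `path`.
path = "Stolberg(Rheinl)Hbf Gl.44|Eschweiler-St.Jöris|Alsdorf Poststraße"

# Splitting on the pipe recovers the individual stops.
stops = path.split("|")

# Assumed tokenization behind the frequency table: lower-case, then split
# on whitespace. Pipes survive inside the tokens.
words = path.lower().split()

print(stops)
print(words)
```

For any per-stop analysis, splitting on `|` is probably the more useful of the two.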
Most occurring characters
| Value | Count | Frequency (%) |
| e | 36026888 | 10.7% |
| r | 26440854 | 7.9% |
| n | 25409619 | 7.6% |
| a | 17843776 | 5.3% |
| | | 17652750 | 5.3% |
| i | 15166799 | 4.5% |
| l | 15129247 | 4.5% |
| t | 14062373 | 4.2% |
| s | 12328613 | 3.7% |
| h | 11798194 | 3.5% |
| Other values (67) | 143885562 |
eva_nr
Real number (ℝ)
High correlation 
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8018266.8 |
| Minimum | 8000001 |
|---|---|
| Maximum | 8098360 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 8000001 |
|---|---|
| 5-th percentile | 8000105 |
| Q1 | 8001582 |
| median | 8004136 |
| Q3 | 8010208 |
| 95-th percentile | 8089080 |
| Maximum | 8098360 |
| Range | 98359 |
| Interquartile range (IQR) | 8626 |
Descriptive statistics
| Standard deviation | 31786.593 |
|---|---|
| Coefficient of variation (CV) | 0.0039642724 |
| Kurtosis | 1.0111613 |
| Mean | 8018266.8 |
| Median Absolute Deviation (MAD) | 3054 |
| Skewness | 1.7122051 |
| Sum | 1.652851 × 10^13 |
| Variance | 1.0103875 × 10^9 |
| Monotonicity | Not monotonic |
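Several entries in these tables are derived from others, which makes them easy to cross-check. The sketch below recomputes the range, the interquartile range, the coefficient of variation and the variance from the printed values. Small deviations in the last digits are expected because the inputs are already rounded. Since `eva_nr` looks like an identifier rather than a measurement, these figures are mainly useful as a consistency check.

```python
import math

# Values copied from the eva_nr tables above.
minimum, maximum = 8_000_001, 8_098_360
q1, q3 = 8_001_582, 8_010_208
mean, std = 8_018_266.8, 31_786.593

value_range = maximum - minimum   # "Range"
iqr = q3 - q1                     # "Interquartile range (IQR)"
cv = std / mean                   # "Coefficient of variation (CV)"
variance = std ** 2               # "Variance"

print(value_range, iqr)
print(f"{cv:.10f}", f"{variance:.7e}")
```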
Common Values
| Value | Count | Frequency (%) |
| 8004128 | 8732 | 0.4% |
| 8089047 | 8312 | 0.4% |
| 8000262 | 7814 | 0.4% |
| 8004132 | 7598 | 0.4% |
| 8004131 | 7382 | 0.4% |
| 8004135 | 7378 | 0.4% |
| 8004129 | 7366 | 0.4% |
| 8004136 | 7324 | 0.4% |
| 8089045 | 7124 | 0.3% |
| 8003368 | 6828 | 0.3% |
| Other values (1986) | 1985499 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 8000001 | 1488 | |
| 8000002 | 823 | < 0.1% |
| 8000004 | 848 | < 0.1% |
| 8000007 | 591 | < 0.1% |
| 8000009 | 829 | < 0.1% |
| 8000010 | 946 | |
| 8000011 | 589 | < 0.1% |
| 8000012 | 896 | < 0.1% |
| 8000013 | 2337 | |
| 8000014 | 756 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 8098360 | 534 | < 0.1% |
| 8089537 | 2191 | 0.1% |
| 8089474 | 5831 | |
| 8089473 | 1536 | 0.1% |
| 8089472 | 1544 | 0.1% |
| 8089331 | 1690 | 0.1% |
| 8089330 | 1923 | 0.1% |
| 8089329 | 1811 | 0.1% |
| 8089328 | 1941 | 0.1% |
| 8089327 | 2781 |
category
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 4 |
| 4th row | 5 |
| 5th row | 5 |
Common Values
| Value | Count | Frequency (%) |
| 4 | 788039 | |
| 5 | 643822 | |
| 3 | 421535 | |
| 2 | 137222 | 6.7% |
| 1 | 70739 | 3.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 4 | 788039 | |
| 5 | 643822 | |
| 3 | 421535 | |
| 2 | 137222 | 6.7% |
| 1 | 70739 | 3.4% |
station
Text
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 42 |
|---|---|
| Median length | 30 |
| Mean length | 14.651004 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Aachen Hbf |
|---|---|
| 2nd row | Aachen Hbf |
| 3rd row | Aachen-Rothe Erde |
| 4th row | Aachen West |
| 5th row | Aachen West |
| Value | Count | Frequency (%) |
| hbf | 187162 | 5.9% |
| münchen | 63047 | 2.0% |
| main | 62447 | 2.0% |
| frankfurt | 54578 | 1.7% |
| straße | 39439 | 1.2% |
| berlin | 34076 | 1.1% |
| stuttgart | 27326 | 0.9% |
| bad | 27081 | 0.8% |
| köln | 25774 | 0.8% |
| ost | 25336 | 0.8% |
| Other values (2079) | 2644502 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3529648 | 11.7% |
| r | 2333415 | 7.7% |
| n | 2330307 | 7.7% |
| a | 1806903 | 6.0% |
| t | 1459870 | 4.8% |
| i | 1324360 | 4.4% |
| l | 1291474 | 4.3% |
| s | 1288466 | 4.3% |
| h | 1201107 | 4.0% |
| (space) | 1129411 | 3.7% |
| Other values (53) | 12505988 |
state
Categorical
High correlation 
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 22 |
|---|---|
| Median length | 19 |
| Mean length | 10.957078 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Nordrhein-Westfalen |
|---|---|
| 2nd row | Nordrhein-Westfalen |
| 3rd row | Nordrhein-Westfalen |
| 4th row | Nordrhein-Westfalen |
| 5th row | Nordrhein-Westfalen |
Common Values
| Value | Count | Frequency (%) |
| Nordrhein-Westfalen | 342957 | |
| Berlin | 334845 | |
| Bayern | 330381 | |
| Baden-Württemberg | 253224 | |
| Hessen | 200308 | |
| Hamburg | 154982 | |
| Sachsen | 84791 | 4.1% |
| Niedersachsen | 82767 | 4.0% |
| Rheinland-Pfalz | 78941 | 3.8% |
| Brandenburg | 58961 | 2.9% |
| Other values (6) | 139200 |
Words
| Value | Count | Frequency (%) |
| nordrhein-westfalen | 342957 | |
| berlin | 334845 | |
| bayern | 330381 | |
| baden-württemberg | 253224 | |
| hessen | 200308 | |
| hamburg | 154982 | |
| sachsen | 84791 | 4.1% |
| niedersachsen | 82767 | 4.0% |
| rheinland-pfalz | 78941 | 3.8% |
| brandenburg | 58961 | 2.9% |
| Other values (6) | 139200 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3540070 | |
| n | 2462639 | 10.9% |
| r | 2331538 | 10.3% |
| a | 1579992 | 7.0% |
| s | 1097165 | 4.9% |
| B | 987839 | 4.4% |
| l | 979365 | 4.3% |
| i | 932853 | 4.1% |
| t | 916588 | 4.1% |
| d | 834133 | 3.7% |
| Other values (25) | 6924268 |
city
Text
| Distinct | 1292 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 25 |
|---|---|
| Median length | 23 |
| Mean length | 8.9921067 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Aachen |
|---|---|
| 2nd row | Aachen |
| 3rd row | Aachen |
| 4th row | Aachen |
| 5th row | Aachen |
| Value | Count | Frequency (%) |
| berlin | 336007 | 13.8% |
| hamburg | 154982 | 6.4% |
| münchen | 118076 | 4.9% |
| main | 87750 | 3.6% |
| am | 82347 | 3.4% |
| frankfurt | 69216 | 2.9% |
| köln | 42898 | 1.8% |
| stuttgart | 41626 | 1.7% |
| düsseldorf | 38329 | 1.6% |
| bad | 28402 | 1.2% |
| Other values (1345) | 1427171 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 2104898 | 11.4% |
| n | 1844843 | 10.0% |
| r | 1622878 | 8.8% |
| a | 1151914 | 6.2% |
| i | 1088569 | 5.9% |
| l | 884751 | 4.8% |
| t | 713605 | 3.8% |
| u | 688424 | 3.7% |
| h | 639238 | 3.4% |
| g | 636523 | 3.4% |
| Other values (50) | 7160299 |
zip
Real number (ℝ)
High correlation 
| Distinct | 1651 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 46279.239 |
| Minimum | 1067 |
|---|---|
| Maximum | 99974 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 1067 |
|---|---|
| 5-th percentile | 7745 |
| Q1 | 18109 |
| median | 47051 |
| Q3 | 70806 |
| 95-th percentile | 88427 |
| Maximum | 99974 |
| Range | 98907 |
| Interquartile range (IQR) | 52697 |
Descriptive statistics
| Standard deviation | 28214.243 |
|---|---|
| Coefficient of variation (CV) | 0.60965226 |
| Kurtosis | -1.3681252 |
| Mean | 46279.239 |
| Median Absolute Deviation (MAD) | 26211 |
| Skewness | 0.045692967 |
| Sum | 9.5398033 × 10^10 |
| Variance | 7.9604349 × 10^8 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 80331 | 22358 | 1.1% |
| 80639 | 13233 | 0.6% |
| 10557 | 12991 | 0.6% |
| 14057 | 11976 | 0.6% |
| 10117 | 11693 | 0.6% |
| 10827 | 11687 | 0.6% |
| 60313 | 11368 | 0.6% |
| 22525 | 9877 | 0.5% |
| 10317 | 9655 | 0.5% |
| 20354 | 9641 | 0.5% |
| Other values (1641) | 1936878 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1067 | 2458 | |
| 1069 | 2045 | |
| 1097 | 3305 | |
| 1109 | 1800 | |
| 1127 | 597 | < 0.1% |
| 1129 | 1882 | |
| 1159 | 988 | < 0.1% |
| 1187 | 566 | < 0.1% |
| 1219 | 917 | < 0.1% |
| 1237 | 1944 |
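The smallest values of `zip` have only four digits. German postal codes have five digits, so this suggests that the column was read as a number and leading zeros were dropped. A minimal sketch for restoring the string form, assuming every value is a five-digit code stored as an integer; the helper name `format_zip` is hypothetical.

```python
def format_zip(value: int) -> str:
    """Render a numeric postal code as a five-character string."""
    # Assumption: every code is a German five-digit postal code stored as an integer.
    return f"{value:05d}"


for raw in (1067, 10557, 99974):
    print(raw, "->", format_zip(raw))
```

Treating the column as text would also make statistics such as the mean or the variance disappear from the profile, which is arguably appropriate for a code.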
Maximum 10 values
| Value | Count | Frequency (%) |
| 99974 | 421 | |
| 99947 | 453 | |
| 99880 | 424 | |
| 99867 | 494 | |
| 99817 | 453 | |
| 99752 | 252 | |
| 99734 | 497 | |
| 99610 | 354 | |
| 99518 | 279 | |
| 99510 | 360 |
long
Real number (ℝ)
High correlation 
| Distinct | 1995 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.183673 |
| Minimum | 6.070715 |
|---|---|
| Maximum | 14.97908 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 6.070715 |
|---|---|
| 5-th percentile | 6.815137 |
| Q1 | 8.494709 |
| median | 9.944088 |
| Q3 | 12.090548 |
| 95-th percentile | 13.513799 |
| Maximum | 14.97908 |
| Range | 8.908365 |
| Interquartile range (IQR) | 3.595839 |
Descriptive statistics
| Standard deviation | 2.2735246 |
|---|---|
| Coefficient of variation (CV) | 0.22325193 |
| Kurtosis | -1.2261664 |
| Mean | 10.183673 |
| Median Absolute Deviation (MAD) | 1.694189 |
| Skewness | 0.11311638 |
| Sum | 20992186 |
| Variance | 5.1689143 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 11.536537 | 8732 | 0.4% |
| 13.283966 | 8312 | 0.4% |
| 11.604971 | 7814 | 0.4% |
| 11.565619 | 7598 | 0.4% |
| 11.583234 | 7382 | 0.4% |
| 11.575386 | 7378 | 0.4% |
| 11.548572 | 7366 | 0.4% |
| 11.593049 | 7324 | 0.4% |
| 13.451646 | 7124 | 0.3% |
| 6.975001 | 6828 | 0.3% |
| Other values (1985) | 1985499 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 6.070715 | 1744 | |
| 6.07384 | 1213 | |
| 6.074485 | 1051 | |
| 6.091499 | 1488 | |
| 6.094486 | 1899 | |
| 6.097265 | 820 | |
| 6.116475 | 949 | |
| 6.124518 | 818 | |
| 6.203225 | 252 | < 0.1% |
| 6.207467 | 717 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 14.97908 | 608 | |
| 14.902088 | 272 | < 0.1% |
| 14.805774 | 578 | |
| 14.706775 | 348 | |
| 14.671941 | 461 | |
| 14.658435 | 480 | |
| 14.648866 | 266 | < 0.1% |
| 14.638027 | 267 | < 0.1% |
| 14.578802 | 280 | < 0.1% |
| 14.546496 | 716 |
lat
Real number (ℝ)
High correlation 
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.8824 |
| Minimum | 47.411032 |
|---|---|
| Maximum | 54.906839 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 47.411032 |
|---|---|
| 5-th percentile | 48.111413 |
| Q1 | 49.353291 |
| median | 51.087456 |
| Q3 | 52.478542 |
| 95-th percentile | 53.564711 |
| Maximum | 54.906839 |
| Range | 7.495807 |
| Interquartile range (IQR) | 3.125251 |
Descriptive statistics
| Standard deviation | 1.7922171 |
|---|---|
| Coefficient of variation (CV) | 0.035222731 |
| Kurtosis | -1.1318607 |
| Mean | 50.8824 |
| Median Absolute Deviation (MAD) | 1.42371 |
| Skewness | -0.11825504 |
| Sum | 1.0488679 × 10^8 |
| Variance | 3.2120421 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 48.142623 | 8732 | 0.4% |
| 52.500737 | 8312 | 0.4% |
| 48.12744 | 7814 | 0.4% |
| 48.139452 | 7598 | 0.4% |
| 48.134202 | 7382 | 0.4% |
| 48.137048 | 7378 | 0.4% |
| 48.141969 | 7366 | 0.4% |
| 48.129168 | 7324 | 0.4% |
| 52.505976 | 7124 | 0.3% |
| 50.940874 | 6828 | 0.3% |
| Other values (1986) | 1985499 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 47.411032 | 221 | < 0.1% |
| 47.44003 | 237 | < 0.1% |
| 47.456591 | 449 | |
| 47.491452 | 419 | |
| 47.513241 | 474 | |
| 47.544341 | 565 | |
| 47.5509 | 368 | |
| 47.552384 | 874 | |
| 47.555857 | 484 | |
| 47.556923 | 610 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 54.906839 | 234 | |
| 54.888814 | 364 | |
| 54.872142 | 371 | |
| 54.861997 | 381 | |
| 54.789605 | 565 | |
| 54.774039 | 281 | |
| 54.685934 | 373 | |
| 54.621166 | 311 | |
| 54.499457 | 517 | |
| 54.4720826 | 537 |
arrival_plan
Date
Missing 
| Distinct | 10084 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 211355 |
| Missing (%) | 10.3% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-07 23:37:00 |
|---|---|
| Maximum | 2024-07-14 23:59:00 |
departure_plan
Date
| Distinct | 10089 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-08 00:00:00 |
|---|---|
| Maximum | 2024-07-15 00:10:00 |
arrival_change
Date
Missing 
| Distinct | 10122 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 475630 |
| Missing (%) | 23.1% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-07 23:39:00 |
|---|---|
| Maximum | 2024-07-15 01:03:00 |
departure_change
Date
Missing 
| Distinct | 10118 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 339926 |
| Missing (%) | 16.5% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-08 00:00:00 |
|---|---|
| Maximum | 2024-07-15 01:04:00 |
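The `*_plan` and `*_change` columns hold planned and changed timestamps, while the delay columns are given in minutes, have no missing values and no negative values. One plausible relationship is that the delay is the difference between changed and planned time, clipped at zero, with a missing change counted as no delay. The report does not state this rule, so the sketch below is an assumption, and the timestamps in it are made up for illustration.

```python
from datetime import datetime
from typing import Optional


def delay_minutes(plan: datetime, change: Optional[datetime]) -> int:
    """Delay in whole minutes, clipped at zero; 0 when no change is recorded."""
    # Assumption: a missing change timestamp means "no reported delay".
    if change is None:
        return 0
    minutes = int((change - plan).total_seconds() // 60)
    return max(0, minutes)


# Hypothetical timestamps inside the report's date range.
plan = datetime(2024, 7, 8, 0, 0)
print(delay_minutes(plan, datetime(2024, 7, 8, 0, 6)))    # later than planned
print(delay_minutes(plan, datetime(2024, 7, 7, 23, 58)))  # earlier than planned
print(delay_minutes(plan, None))                          # no change recorded
```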
arrival_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 116 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1765808 |
| Minimum | 0 |
|---|---|
| Maximum | 159 |
| Zeros | 1406905 |
| Zeros (%) | 68.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.4078587 |
|---|---|
| Coefficient of variation (CV) | 2.8964086 |
| Kurtosis | 107.13566 |
| Mean | 1.1765808 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.6803854 |
| Sum | 2425353 |
| Variance | 11.613501 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1406905 | |
| 1 | 255045 | 12.4% |
| 2 | 130459 | 6.3% |
| 3 | 80446 | 3.9% |
| 4 | 46109 | 2.2% |
| 5 | 31440 | 1.5% |
| 6 | 21950 | 1.1% |
| 7 | 15765 | 0.8% |
| 8 | 12246 | 0.6% |
| 9 | 9856 | 0.5% |
| Other values (106) | 51136 | 2.5% |
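The counts in this table can be cross-checked against `arrival_delay_check`, which reports 110953 rows as `delay` and 1950404 as `on_time`. Summing the rows with a delay above 5 minutes reproduces those numbers, which is consistent with a rule of the form "delay if more than 5 minutes". The rule is inferred from the counts and is not stated in the report.

```python
# Counts copied from the arrival_delay_m value table (minutes -> rows).
counts = {0: 1_406_905, 1: 255_045, 2: 130_459, 3: 80_446, 4: 46_109,
          5: 31_440, 6: 21_950, 7: 15_765, 8: 12_246, 9: 9_856}
other_values = 51_136  # the "Other values (106)" row

# The ten listed values are 0..9, so this assumes every value in the
# "Other values" row is 10 minutes or more.
threshold = 5  # inferred, not stated in the report
delayed = sum(n for minutes, n in counts.items() if minutes > threshold) + other_values
on_time = sum(n for minutes, n in counts.items() if minutes <= threshold)

print(delayed, on_time)
```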
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1406905 | |
| 1 | 255045 | 12.4% |
| 2 | 130459 | 6.3% |
| 3 | 80446 | 3.9% |
| 4 | 46109 | 2.2% |
| 5 | 31440 | 1.5% |
| 6 | 21950 | 1.1% |
| 7 | 15765 | 0.8% |
| 8 | 12246 | 0.6% |
| 9 | 9856 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 157 | 2 | |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 3 | |
| 132 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 116 | 3 |
departure_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 121 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2235736 |
| Minimum | 0 |
|---|---|
| Maximum | 159 |
| Zeros | 1338078 |
| Zeros (%) | 64.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.4183003 |
|---|---|
| Coefficient of variation (CV) | 2.7937023 |
| Kurtosis | 107.25174 |
| Mean | 1.2235736 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.6751063 |
| Sum | 2522222 |
| Variance | 11.684777 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1338078 | |
| 1 | 306206 | 14.9% |
| 2 | 146365 | 7.1% |
| 3 | 81512 | 4.0% |
| 4 | 46542 | 2.3% |
| 5 | 31306 | 1.5% |
| 6 | 21750 | 1.1% |
| 7 | 15715 | 0.8% |
| 8 | 12283 | 0.6% |
| 9 | 9775 | 0.5% |
| Other values (111) | 51825 | 2.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1338078 | |
| 1 | 306206 | 14.9% |
| 2 | 146365 | 7.1% |
| 3 | 81512 | 4.0% |
| 4 | 46542 | 2.3% |
| 5 | 31306 | 1.5% |
| 6 | 21750 | 1.1% |
| 7 | 15715 | 0.8% |
| 8 | 12283 | 0.6% |
| 9 | 9775 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 159 | 1 | |
| 157 | 1 | |
| 156 | 1 | |
| 137 | 1 | |
| 135 | 1 | |
| 134 | 2 | |
| 133 | 1 | |
| 132 | 2 | |
| 131 | 1 | |
| 120 | 1 |
info
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1416016 |
| Missing (%) | 68.7% |
| Memory size | 15.7 MiB |
Length
| Max length | 34 |
|---|---|
| Median length | 11 |
| Mean length | 16.525872 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Bauarbeiten. (Quelle: zuginfo.nrw) |
|---|---|
| 2nd row | Information |
| 3rd row | Information |
| 4th row | Information |
| 5th row | Information |
Common Values
| Value | Count | Frequency (%) |
| Information | 244033 | 11.8% |
| Störung | 116325 | 5.6% |
| Bauarbeiten | 96301 | 4.7% |
| Information. (Quelle: zuginfo.nrw) | 78977 | 3.8% |
| Bauarbeiten. (Quelle: zuginfo.nrw) | 72555 | 3.5% |
| Störung. (Quelle: zuginfo.nrw) | 28744 | 1.4% |
| Großstörung | 8406 | 0.4% |
| (Missing) | 1416016 |
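Three of the seven distinct values differ from another value only by a trailing `. (Quelle: zuginfo.nrw)` source note. A minimal sketch of collapsing them into their base categories is shown here; whether the source note should be dropped depends on the analysis, so treat this as one possible cleaning step rather than a recommendation from the report.

```python
import re
from collections import Counter

# Counts copied from the Common Values table of `info`.
info_counts = {
    "Information": 244_033,
    "Störung": 116_325,
    "Bauarbeiten": 96_301,
    "Information. (Quelle: zuginfo.nrw)": 78_977,
    "Bauarbeiten. (Quelle: zuginfo.nrw)": 72_555,
    "Störung. (Quelle: zuginfo.nrw)": 28_744,
    "Großstörung": 8_406,
}

# Strip an optional ". (Quelle: ...)" suffix to get the base category.
SOURCE_SUFFIX = re.compile(r"\.\s*\(Quelle:[^)]*\)$")

merged = Counter()
for label, n in info_counts.items():
    merged[SOURCE_SUFFIX.sub("", label)] += n

for label, n in merged.most_common():
    print(label, n)
```

The merged totals agree with the per-word counts the report lists for this variable (information 323010, bauarbeiten 168856, störung 145069, großstörung 8406).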
Words
| Value | Count | Frequency (%) |
| information | 323010 | |
| quelle | 180276 | |
| zuginfo.nrw | 180276 | |
| bauarbeiten | 168856 | |
| störung | 145069 | |
| großstörung | 8406 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 1328903 | 12.5% |
| o | 834702 | 7.8% |
| r | 834023 | 7.8% |
| e | 698264 | 6.5% |
| u | 682883 | 6.4% |
| i | 672142 | 6.3% |
| a | 660722 | 6.2% |
| t | 645341 | 6.1% |
| f | 503286 | 4.7% |
| l | 360552 | 3.4% |
| Other values (18) | 3444005 |
arrival_delay_check
Categorical
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
| Value | Count |
|---|---|
| on_time | 1950404 |
| delay | 110953 |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.8923496 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | on_time |
|---|---|
| 2nd row | on_time |
| 3rd row | on_time |
| 4th row | on_time |
| 5th row | on_time |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1950404 | 94.6% |
| delay | 110953 | 5.4% |
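The Length figures for `arrival_delay_check` can be recomputed from the two value counts alone. The sketch below is a plain arithmetic check, not the profiler's own code; the function name is illustrative.

```python
# Counts copied from the Common Values table for `arrival_delay_check`.
COUNTS = {"on_time": 1950404, "delay": 110953}


def length_stats(counts):
    """Return (total characters, mean, min, max) of the value lengths."""
    n_rows = sum(counts.values())
    total_chars = sum(len(value) * count for value, count in counts.items())
    lengths = [len(value) for value in counts]
    return total_chars, total_chars / n_rows, min(lengths), max(lengths)


total_chars, mean_len, min_len, max_len = length_stats(COUNTS)
print(total_chars, round(mean_len, 7), min_len, max_len)
```

The total of 14207593 characters equals the count shown under Most occurring categories, and dividing it by the 2061357 rows gives the reported mean length of about 6.89.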
Plots omitted: Length, Common Values (Plot)
Words
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1950404 | 94.6% |
| delay | 110953 | 5.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2061357 | 14.5% |
| n | 1950404 | 13.7% |
| o | 1950404 | 13.7% |
| _ | 1950404 | 13.7% |
| t | 1950404 | 13.7% |
| i | 1950404 | 13.7% |
| m | 1950404 | 13.7% |
| d | 110953 | 0.8% |
| l | 110953 | 0.8% |
| a | 110953 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 14207593 | 100.0% |
departure_delay_check
Categorical
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
| Value | Count |
|---|---|
| on_time | 1950009 |
| delay | 111348 |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.8919663 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | on_time |
|---|---|
| 2nd row | on_time |
| 3rd row | on_time |
| 4th row | on_time |
| 5th row | on_time |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1950009 | 94.6% |
| delay | 111348 | 5.4% |
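The Imbalance flag on this variable is listed at 69.7% in the report overview. That figure is consistent with one minus the Shannon entropy of the class shares, normalized by the logarithm of the number of classes. The formula is an assumption about how the score is derived, since the report does not state it; the sketch below only checks the arithmetic against the two counts above.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for balanced classes, near 1 for one dominant class.

    Assumption: this matches the score behind the report's Imbalance alert.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no balance to measure (a choice made here)
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(n_classes)


# Counts copied from the Common Values table for `departure_delay_check`.
score = imbalance_score([1950009, 111348])
print(f"{score:.1%}")
```

The same function applied to the `arrival_delay_check` counts (1950404 and 110953) gives roughly 69.8%, which also agrees with the overview.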
Plots omitted: Length, Common Values (Plot)
Words
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1950009 | 94.6% |
| delay | 111348 | 5.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2061357 | 14.5% |
| n | 1950009 | 13.7% |
| o | 1950009 | 13.7% |
| _ | 1950009 | 13.7% |
| t | 1950009 | 13.7% |
| i | 1950009 | 13.7% |
| m | 1950009 | 13.7% |
| d | 111348 | 0.8% |
| l | 111348 | 0.8% |
| a | 111348 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 14206803 | 100.0% |
Interactions
Correlations
| | arrival_delay_check | arrival_delay_m | category | departure_delay_check | departure_delay_m | eva_nr | info | lat | long | state | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|
| arrival_delay_check | 1.000 | 0.433 | 0.031 | 0.910 | 0.414 | 0.086 | 0.086 | 0.096 | 0.102 | 0.114 | 0.107 |
| arrival_delay_m | 0.433 | 1.000 | 0.010 | 0.428 | 0.823 | -0.086 | 0.027 | -0.251 | -0.105 | 0.019 | 0.226 |
| category | 0.031 | 0.010 | 1.000 | 0.032 | 0.011 | 0.163 | 0.127 | 0.176 | 0.175 | 0.209 | 0.158 |
| departure_delay_check | 0.910 | 0.428 | 0.032 | 1.000 | 0.434 | 0.087 | 0.085 | 0.096 | 0.102 | 0.114 | 0.107 |
| departure_delay_m | 0.414 | 0.823 | 0.011 | 0.434 | 1.000 | -0.094 | 0.027 | -0.270 | -0.111 | 0.019 | 0.245 |
| eva_nr | 0.086 | -0.086 | 0.163 | 0.087 | -0.094 | 1.000 | 0.314 | 0.348 | 0.654 | 0.706 | -0.531 |
| info | 0.086 | 0.027 | 0.127 | 0.085 | 0.027 | 0.314 | 1.000 | 0.515 | 0.563 | 0.577 | 0.540 |
| lat | 0.096 | -0.251 | 0.176 | 0.096 | -0.270 | 0.348 | 0.515 | 1.000 | 0.258 | 0.688 | -0.833 |
| long | 0.102 | -0.105 | 0.175 | 0.102 | -0.111 | 0.654 | 0.563 | 0.258 | 1.000 | 0.650 | -0.410 |
| state | 0.114 | 0.019 | 0.209 | 0.114 | 0.019 | 0.706 | 0.577 | 0.688 | 0.650 | 1.000 | 0.696 |
| zip | 0.107 | 0.226 | 0.158 | 0.107 | 0.245 | -0.531 | 0.540 | -0.833 | -0.410 | 0.696 | 1.000 |
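The High correlation alerts in the overview can be reproduced from this matrix with a cutoff on the absolute coefficient. The sketch below assumes a cutoff of 0.5, which is not stated in the report but matches every alert; the function name is illustrative and the values are copied from the table.

```python
NAMES = [
    "arrival_delay_check", "arrival_delay_m", "category",
    "departure_delay_check", "departure_delay_m", "eva_nr",
    "info", "lat", "long", "state", "zip",
]

# Rows and columns follow the order of NAMES.
MATRIX = [
    [1.000, 0.433, 0.031, 0.910, 0.414, 0.086, 0.086, 0.096, 0.102, 0.114, 0.107],
    [0.433, 1.000, 0.010, 0.428, 0.823, -0.086, 0.027, -0.251, -0.105, 0.019, 0.226],
    [0.031, 0.010, 1.000, 0.032, 0.011, 0.163, 0.127, 0.176, 0.175, 0.209, 0.158],
    [0.910, 0.428, 0.032, 1.000, 0.434, 0.087, 0.085, 0.096, 0.102, 0.114, 0.107],
    [0.414, 0.823, 0.011, 0.434, 1.000, -0.094, 0.027, -0.270, -0.111, 0.019, 0.245],
    [0.086, -0.086, 0.163, 0.087, -0.094, 1.000, 0.314, 0.348, 0.654, 0.706, -0.531],
    [0.086, 0.027, 0.127, 0.085, 0.027, 0.314, 1.000, 0.515, 0.563, 0.577, 0.540],
    [0.096, -0.251, 0.176, 0.096, -0.270, 0.348, 0.515, 1.000, 0.258, 0.688, -0.833],
    [0.102, -0.105, 0.175, 0.102, -0.111, 0.654, 0.563, 0.258, 1.000, 0.650, -0.410],
    [0.114, 0.019, 0.209, 0.114, 0.019, 0.706, 0.577, 0.688, 0.650, 1.000, 0.696],
    [0.107, 0.226, 0.158, 0.107, 0.245, -0.531, 0.540, -0.833, -0.410, 0.696, 1.000],
]

THRESHOLD = 0.5  # assumption: not stated in the report


def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |coefficient| >= threshold."""
    result = {}
    for i, name in enumerate(names):
        result[name] = [
            other
            for j, other in enumerate(names)
            if i != j and abs(matrix[i][j]) >= threshold
        ]
    return result


high = high_correlations(NAMES, MATRIX, THRESHOLD)
for name, partners in high.items():
    if partners:
        print(name, "->", ", ".join(partners))
```

`category` is the only variable with no partner at this cutoff, which agrees with it being the only variable in the matrix without a High correlation alert.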
Missing values
Sample
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1573967790757085557-2407072312-14 | 20 | Stolberg(Rheinl)Hbf Gl.44\|Eschweiler-St.Jöris\|Alsdorf Poststraße\|Alsdorf-Mariadorf\|Alsdorf-Kellersberg\|Alsdorf-Annapark\|Alsdorf-Busch\|Herzogenrath-August-Schmidt-Platz\|Herzogenrath-Alt-Merkstein\|Herzogenrath\|Kohlscheid\|Aachen West\|Aachen Schanz | 8000001 | 2 | Aachen Hbf | Nordrhein-Westfalen | Aachen | 52064 | 6.091499 | 50.767800 | 2024-07-08 00:00:00 | 2024-07-08 00:01:00 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 3 | 3 | NaN | on_time | on_time |
| 1 | 349781417030375472-2407080017-1 | 18 | NaN | 8000001 | 2 | Aachen Hbf | Nordrhein-Westfalen | Aachen | 52064 | 6.091499 | 50.767800 | NaN | 2024-07-08 00:17:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time |
| 2 | 7157250219775883918-2407072120-25 | 1 | Hamm(Westf)Hbf\|Kamen\|Kamen-Methler\|Dortmund-Kurl\|Dortmund-Scharnhorst\|Dortmund Hbf\|Bochum Hbf\|Wattenscheid\|Essen Hbf\|Mülheim(Ruhr)Hbf\|Duisburg Hbf\|Düsseldorf Flughafen\|Düsseldorf Hbf\|Düsseldorf-Benrath\|Leverkusen Mitte\|Köln-Mülheim\|Köln Messe/Deutz\|Köln Hbf\|Köln-Ehrenfeld\|Horrem\|Düren\|Langerwehe\|Eschweiler Hbf\|Stolberg(Rheinl)Hbf | 8000406 | 4 | Aachen-Rothe Erde | Nordrhein-Westfalen | Aachen | 52066 | 6.116475 | 50.770202 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 0 | 0 | NaN | on_time | on_time |
| 3 | 349781417030375472-2407080017-2 | 18 | Aachen Hbf | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072 | 6.070715 | 50.780360 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time |
| 4 | 1983158592123451570-2407080010-3 | 33 | Herzogenrath\|Kohlscheid | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072 | 6.070715 | 50.780360 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | 0 | 0 | NaN | on_time | on_time |
| 5 | -5293934437045765939-2407080023-2 | 4 | Herzogenrath | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072 | 6.070715 | 50.780360 | 2024-07-08 00:30:00 | 2024-07-08 00:31:00 | 2024-07-08 00:30:00 | 2024-07-08 00:31:00 | 0 | 0 | Bauarbeiten. (Quelle: zuginfo.nrw) | on_time | on_time |
| 6 | 6845762881043426854-2407072357-6 | RB33 | Lindern\|Geilenkirchen\|Übach-Palenberg\|Herzogenrath\|Kohlscheid | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072 | 6.070715 | 50.780360 | 2024-07-08 00:58:00 | 2024-07-08 00:58:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time |
| 7 | -2100556839975301087-2407072307-13 | 18 | Liège-Guillemins\|Bressoux\|Vise\|Eijsden\|Maastricht Randwyck\|Maastricht\|Meerssen\|Valkenburg(NL)\|Heerlen\|Landgraaf\|Eygelshoven Markt\|Herzogenrath | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072 | 6.070715 | 50.780360 | 2024-07-08 00:37:00 | 2024-07-08 00:41:00 | 2024-07-08 00:37:00 | 2024-07-08 00:41:00 | 0 | 0 | NaN | on_time | on_time |
| 8 | -7696913984968518161-2407080037-1 | 13 | NaN | 8000002 | 3 | Aalen Hbf | Baden-Württemberg | Aalen | 73430 | 10.096271 | 48.841013 | NaN | 2024-07-08 00:37:00 | NaN | 2024-07-08 00:37:00 | 0 | 0 | Information | on_time | on_time |
| 9 | -6027587483204218492-2407080013-4 | 8 | Bremen Hbf\|Bremen-Sebaldsbrück\|Bremen-Mahndorf | 8000413 | 4 | Achim | Niedersachsen | Achim | 28832 | 9.030447 | 53.015990 | 2024-07-08 00:27:00 | 2024-07-08 00:27:00 | 2024-07-08 01:16:00 | 2024-07-08 01:17:00 | 49 | 50 | NaN | delay | delay |
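In the sample rows, `arrival_delay_m` and `departure_delay_m` equal the difference in minutes between the `*_change` and `*_plan` timestamps, and they are 0 wherever either timestamp is missing. The sketch below reproduces that relationship for a few rows. It is inferred from the sample only: the function name is illustrative, `None` stands in for `NaN`, and the treatment of early arrivals (negative differences) is not visible in the sample and is left unhandled.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def delay_minutes(plan, change):
    """Whole minutes from planned to changed time; 0 if either value is missing.

    Assumption: a missing changed time means no delay was recorded.
    """
    if plan is None or change is None:
        return 0
    delta = datetime.strptime(change, FMT) - datetime.strptime(plan, FMT)
    return int(delta.total_seconds() // 60)


# (plan, change) pairs copied from sample rows 0, 9 (arrival), 9 (departure), 3.
pairs = [
    ("2024-07-08 00:00:00", "2024-07-08 00:03:00"),
    ("2024-07-08 00:27:00", "2024-07-08 01:16:00"),
    ("2024-07-08 00:27:00", "2024-07-08 01:17:00"),
    ("2024-07-08 00:20:00", None),
]
delays = [delay_minutes(plan, change) for plan, change in pairs]
print(delays)
```

The results match the delay columns of those rows: 3, 49, 50 and 0.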
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2061347 | 2316002367592887267-2407142343-3 | 16 | Ingolstadt Hbf\|Ingolstadt Nord | 8003074 | 5 | Ingolstadt Audi | Bayern | Ingolstadt | 85055 | 11.407456 | 48.790496 | 2024-07-14 23:50:00 | 2024-07-14 23:50:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time |
| 2061348 | -5262352002503319170-2407142138-19 | 16 | Nürnberg Hbf\|Schwabach\|Roth\|Unterheckenhofen\|Georgensgmünd\|Mühlstetten\|Pleinfeld\|Ellingen(Bay)\|Weißenburg(Bay)\|Treuchtlingen\|Pappenheim\|Solnhofen\|Dollnstein\|Eichstätt Bahnhof\|Adelschlag\|Tauberfeld\|Eitensheim\|Gaimersheim | 8003074 | 5 | Ingolstadt Audi | Bayern | Ingolstadt | 85055 | 11.407456 | 48.790496 | 2024-07-14 23:16:00 | 2024-07-14 23:17:00 | 2024-07-14 23:18:00 | 2024-07-14 23:19:00 | 2 | 2 | NaN | on_time | on_time |
| 2061349 | 1884127837918246080-2407142324-2 | 200 | Wendlingen(Neckar) | 8003983 | 5 | Merklingen - Schwäbische Alb | Baden-Württemberg | Merklingen | 89188 | 9.740877 | 48.521160 | 2024-07-14 23:38:00 | 2024-07-14 23:39:00 | 2024-07-14 23:38:00 | 2024-07-14 23:39:00 | 0 | 0 | Bauarbeiten | on_time | on_time |
| 2061350 | -4498532330426324655-2407142201-14 | RE18 | Osnabrück Hbf\|Osnabrück Altstadt\|Bramsche\|Bersenbrück\|Quakenbrück\|Essen(Oldb)\|Cloppenburg\|Ahlhorn\|Großenkneten\|Huntlosen\|Sandkrug\|Oldenburg(Oldb)Hbf\|Rastede | 8003105 | 5 | Jaderberg | Niedersachsen | Jaderberg | 26349 | 8.184538 | 53.344878 | 2024-07-14 23:55:00 | 2024-07-14 23:56:00 | 2024-07-14 23:55:00 | 2024-07-14 23:56:00 | 0 | 0 | Bauarbeiten | on_time | on_time |
| 2061351 | -5558360799253050120-2407142310-4 | RE18 | Wilhelmshaven\|Sande\|Varel(Oldb) | 8003105 | 5 | Jaderberg | Niedersachsen | Jaderberg | 26349 | 8.184538 | 53.344878 | 2024-07-14 23:33:00 | 2024-07-14 23:33:00 | 2024-07-14 23:33:00 | 2024-07-14 23:33:00 | 0 | 0 | Bauarbeiten | on_time | on_time |
| 2061352 | -3877986638624297828-2407142237-4 | S9 | Bottrop Hbf\|Bottrop-Boy\|Gladbeck West | 8002795 | 5 | Herten (Westf) | Nordrhein-Westfalen | Herten | 45699 | 7.139053 | 51.597508 | 2024-07-14 23:17:00 | 2024-07-14 23:17:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time |
| 2061353 | 3370285438001482281-2407142234-7 | 8 | Lübeck-Travemünde Strand\|Lübeck-Travemünde Hafen\|Lübeck-Travem. Skandinavienkai\|Lübeck-Kücknitz\|Lübeck-Dänischburg IKEA\|Lübeck Hbf | 8003775 | 5 | Lübeck-Moisling | Schleswig-Holstein | Lübeck | 23560 | 10.629500 | 53.836800 | 2024-07-14 23:10:00 | 2024-07-14 23:11:00 | 2024-07-14 23:11:00 | 2024-07-14 23:12:00 | 1 | 1 | Information | on_time | on_time |
| 2061354 | -8774053210575864323-2407142305-3 | 80 | Bad Oldesloe\|Reinfeld(Holst) | 8003775 | 5 | Lübeck-Moisling | Schleswig-Holstein | Lübeck | 23560 | 10.629500 | 53.836800 | 2024-07-14 23:17:00 | 2024-07-14 23:18:00 | 2024-07-14 23:17:00 | 2024-07-14 23:18:00 | 0 | 0 | Information | on_time | on_time |
| 2061355 | -1537118689903044118-2407142354-1 | 11 | NaN | 8001580 | 4 | Düsseldorf Flughafen Terminal | Nordrhein-Westfalen | Düsseldorf | 40474 | 6.766979 | 51.278517 | NaN | 2024-07-14 23:54:00 | NaN | NaN | 0 | 0 | Information. (Quelle: zuginfo.nrw) | on_time | on_time |
| 2061356 | 2862161729195150146-2407142324-1 | 11 | NaN | 8001580 | 4 | Düsseldorf Flughafen Terminal | Nordrhein-Westfalen | Düsseldorf | 40474 | 6.766979 | 51.278517 | NaN | 2024-07-14 23:24:00 | NaN | 2024-07-14 23:24:00 | 0 | 0 | Information. (Quelle: zuginfo.nrw) | on_time | on_time |
Duplicate rows
Most frequently occurring
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1003145420136048192-2407100551-6 | 51 | Gera Hbf\|Hermsdorf-Klosterlausnitz\|Stadtroda\|Jena-Göschwitz\|Jena West | 8010366 | 2 | Weimar | Thüringen | Weimar | 99423 | 11.326458 | 50.991487 | 2024-07-10 06:49:00 | 2024-07-10 07:04:00 | 2024-07-10 06:51:00 | 2024-07-10 07:06:00 | 2 | 2 | NaN | on_time | on_time | 2 |
| 1 | -1008819848758697010-2407130714-22 | 6 | Grafing Bahnhof\|Kirchseeon\|Eglharting\|Zorneding\|Baldham\|Vaterstetten\|Haar\|Gronsdorf\|München-Trudering\|München-Berg am Laim\|München Leuchtenbergring\|München Ost\|München Rosenheimer Platz\|München Isartor\|München Marienplatz\|München Karlsplatz\|München Hbf (tief)\|München Hackerbrücke\|München Donnersbergerbrücke\|München Hirschgarten\|München-Laim | 8004158 | 2 | München-Pasing | Bayern | München | 81241 | 11.461872 | 48.149852 | 2024-07-13 07:59:00 | 2024-07-13 08:01:00 | 2024-07-13 08:02:00 | 2024-07-13 08:03:00 | 3 | 2 | NaN | on_time | on_time | 2 |
| 2 | -1009540259073221553-2407142134-10 | 6 | Starnberg\|Starnberg Nord\|Gauting\|Stockdorf\|Planegg\|Gräfelfing\|Lochham\|München-Westkreuz\|München-Pasing | 8004151 | 3 | München-Laim Pbf | Bayern | München | 80639 | 11.503669 | 48.144371 | 2024-07-14 21:59:00 | 2024-07-14 22:00:00 | 2024-07-14 22:01:00 | 2024-07-14 22:02:00 | 2 | 2 | Bauarbeiten | on_time | on_time | 2 |
| 3 | -1010076636343338093-2407101633-11 | 6 | Köln-Worringen\|Köln-Blumenberg\|Köln-Chorweiler Nord\|Köln-Chorweiler\|Köln Volkhovener Weg\|Köln-Longerich\|Köln Geldernstr./Parkgürtel\|Köln-Nippes\|Köln Hansaring\|Köln Hbf | 8003368 | 1 | Köln Messe/Deutz | Nordrhein-Westfalen | Köln | 50679 | 6.975001 | 50.940874 | 2024-07-10 16:59:00 | 2024-07-10 17:00:00 | 2024-07-10 16:59:00 | 2024-07-10 17:00:00 | 0 | 0 | Information. (Quelle: zuginfo.nrw) | on_time | on_time | 2 |
| 4 | -1012813851155274121-2407111424-7 | 18 | Maastricht\|Meerssen\|Valkenburg(NL)\|Heerlen\|Landgraaf\|Eygelshoven Markt | 8002806 | 3 | Herzogenrath | Nordrhein-Westfalen | Herzogenrath | 52134 | 6.094486 | 50.870916 | 2024-07-11 14:59:00 | 2024-07-11 15:00:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time | 2 |
| 5 | -1012813851155274121-2407141424-7 | 18 | Maastricht\|Meerssen\|Valkenburg(NL)\|Heerlen\|Landgraaf\|Eygelshoven Markt | 8002806 | 3 | Herzogenrath | Nordrhein-Westfalen | Herzogenrath | 52134 | 6.094486 | 50.870916 | 2024-07-14 14:59:00 | 2024-07-14 15:00:00 | NaN | NaN | 0 | 0 | NaN | on_time | on_time | 2 |
| 6 | -1014485518442214187-2407080436-7 | 46 | Ilmenau\|Ilmenau Pörlitzer Höhe\|Ilmenau-Roda\|Elgersburg\|Geraberg\|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338 | 10.908698 | 50.778393 | 2024-07-08 04:57:00 | 2024-07-08 05:06:00 | 2024-07-08 04:57:00 | 2024-07-08 05:06:00 | 0 | 0 | NaN | on_time | on_time | 2 |
| 7 | -1014485518442214187-2407090436-7 | 46 | Ilmenau\|Ilmenau Pörlitzer Höhe\|Ilmenau-Roda\|Elgersburg\|Geraberg\|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338 | 10.908698 | 50.778393 | 2024-07-09 04:57:00 | 2024-07-09 05:06:00 | 2024-07-09 04:57:00 | 2024-07-09 05:06:00 | 0 | 0 | NaN | on_time | on_time | 2 |
| 8 | -1014485518442214187-2407100436-7 | 46 | Ilmenau\|Ilmenau Pörlitzer Höhe\|Ilmenau-Roda\|Elgersburg\|Geraberg\|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338 | 10.908698 | 50.778393 | 2024-07-10 04:57:00 | 2024-07-10 05:06:00 | 2024-07-10 04:57:00 | 2024-07-10 05:06:00 | 0 | 0 | NaN | on_time | on_time | 2 |
| 9 | -1014485518442214187-2407120436-7 | 46 | Ilmenau\|Ilmenau Pörlitzer Höhe\|Ilmenau-Roda\|Elgersburg\|Geraberg\|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338 | 10.908698 | 50.778393 | 2024-07-12 04:57:00 | 2024-07-12 05:06:00 | 2024-07-12 04:57:00 | 2024-07-12 05:06:00 | 0 | 0 | NaN | on_time | on_time | 2 |